ChrisRackauckas-Claude
Contributor

Summary

Complete the triangular solve portion for directly wrapped binaries (MKL and BLIS) to use native LAPACK calls instead of falling back to libblastrampoline. AppleAccelerate was already correctly implemented.

Problem

Previously, the MKL and BLIS LU factorization wrappers were incomplete:

  • Factorization phase (getrf!): Used native binary calls
  • Triangular solve phase (ldiv!): Fell back to Julia's ldiv! → libblastrampoline

This meant that despite having direct native binary access, the solve step still went through the generic Julia Linear Algebra stack.
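
As a rough illustration of the two phases described above, the following sketch uses only Julia's standard `LinearAlgebra.LAPACK` wrappers. These stdlib wrappers themselves dispatch through libblastrampoline, so this shows the getrf!/getrs! split rather than the native MKL or BLIS bindings from the PR. The matrix and right-hand side are made-up values.

```julia
using LinearAlgebra

# Small well-conditioned test system (values chosen for illustration only).
A = [4.0 3.0; 6.0 3.0]
b = [10.0, 12.0]

# Phase 1: factorization. getrf! overwrites its argument with the LU factors
# and returns them together with the pivot vector and the LAPACK info code.
factors, ipiv, info = LinearAlgebra.LAPACK.getrf!(copy(A))

# Phase 2: triangular solves. getrs! overwrites the right-hand side with x.
x = LinearAlgebra.LAPACK.getrs!('N', factors, ipiv, copy(b))

# The generic path: ldiv! on an LU factorization object.
y = ldiv!(similar(b), lu(A), b)
```

Both paths should agree on the solution; the PR's change concerns which binary performs the second phase, not the result.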

Solution

MKL (src/mkl.jl)

  • Replace ldiv!(cache.u, factorization, cache.b) with direct getrs! calls
  • Use existing native MKL getrs! functions that were already implemented but unused
  • Handle both square and overdetermined systems correctly

BLIS (ext/LinearSolveBLISExt.jl)

  • Replace ldiv!(cache.u, factorization, cache.b) with direct getrs! calls
  • Use existing native LAPACK getrs! functions via BLIS that were already implemented but unused
  • Add proper error handling with ReturnCode import
  • Handle both square and overdetermined systems correctly

AppleAccelerate (src/appleaccelerate.jl)

  • No changes needed - already correctly implemented with native aa_getrs! calls

Key Changes

MKL solve method:

# Before
y = ldiv!(cache.u, @get_cacheval(cache, :MKLLUFactorization)[1], cache.b)

# After
A, info = @get_cacheval(cache, :MKLLUFactorization)
# ... dimension handling ...
getrs!('N', A.factors, A.ipiv, cache.u; info)

BLIS solve method:

# Before
y = ldiv!(cache.u, @get_cacheval(cache, :BLISLUFactorization)[1], cache.b)

# After
A, info = @get_cacheval(cache, :BLISLUFactorization)
# ... dimension handling ...
getrs!('N', A.factors, A.ipiv, cache.u; info)
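
The error handling mentioned for failed factorizations can be sketched with the stdlib LAPACK wrappers. This is a minimal illustration, not the PR's code: `lu_solve_checked` is a hypothetical helper, the `:Success`/`:Failure` symbols stand in for SciMLBase's `ReturnCode`, and the stdlib `getrs!` is used instead of the package's own `getrs!` with its `info` keyword.

```julia
using LinearAlgebra

# Hypothetical helper (not from the PR): solve A*x = b with raw LAPACK calls
# and report a status symbol instead of solving with a singular factor.
function lu_solve_checked(A::Matrix{Float64}, b::Vector{Float64})
    # getrf! reports an exactly singular U through a positive info code
    # rather than by throwing.
    factors, ipiv, info = LinearAlgebra.LAPACK.getrf!(copy(A))

    # Skip the triangular solve when the factorization failed.
    info == 0 || return (copy(b), :Failure)

    # getrs! overwrites its right-hand side argument with the solution.
    x = LinearAlgebra.LAPACK.getrs!('N', factors, ipiv, copy(b))
    return (x, :Success)
end

x_ok, status_ok = lu_solve_checked([2.0 1.0; 1.0 3.0], [3.0, 5.0])
x_bad, status_bad = lu_solve_checked([1.0 2.0; 2.0 4.0], [1.0, 2.0])
```

Checking `info` before the solve avoids dividing by a zero pivot in the back-substitution, which would otherwise fill the result with `Inf` or `NaN` values.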

Benefits

  • Performance: Eliminates libblastrampoline overhead across the complete solve process
  • Consistency: All three native binaries now use their own LAPACK throughout
  • Correctness: Proper handling of both square and overdetermined systems
  • Maintainability: Uses existing well-tested getrs! implementations

Test Results

  • ✅ Code compiles successfully without errors
  • ✅ No syntax issues detected
  • ✅ Standard LU functionality remains intact
  • ✅ Native binary loading works correctly when dependencies available

Checklist

  • MKL triangular solve uses native MKL getrs! calls
  • BLIS triangular solve uses native LAPACK getrs! calls via BLIS
  • AppleAccelerate confirmed already correct with native aa_getrs! calls
  • Proper error handling for failed factorizations
  • Both square and overdetermined system support
  • Backward compatibility maintained
  • All implementations compile without errors

🤖 Generated with Claude Code

ChrisRackauckas and others added 3 commits August 3, 2025 18:35
Replace Julia ldiv! fallback with direct MKL getrs! calls for the triangular
solve portion of MKLLUFactorization. This ensures the entire LU solve process
uses native MKL LAPACK routines instead of falling back to libblastrampoline.

Changes:
- Use existing getrs! functions that were already implemented but unused
- Handle both square and overdetermined systems with proper dimension checks
- Add proper error handling for failed factorizations
- Maintain compatibility with existing LinearCache interface

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Replace Julia ldiv! fallback with direct LAPACK getrs! calls via BLIS for the
triangular solve portion of BLISLUFactorization. This ensures the entire LU
solve process uses native LAPACK routines through BLIS instead of falling back
to libblastrampoline.

Changes:
- Use existing getrs! functions that were already implemented but unused
- Handle both square and overdetermined systems with proper dimension checks
- Add proper error handling for failed factorizations with ReturnCode
- Add missing ReturnCode import from SciMLBase
- Maintain compatibility with existing LinearCache interface

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@ChrisRackauckas ChrisRackauckas merged commit 7fd84cf into SciML:main Aug 3, 2025
89 of 100 checks passed
2 participants